# initial imports
import pandas as pd
import numpy as np

# Do not change this option; This allows the CodeGrade auto grading to function correctly
pd.set_option('display.max_columns', 20)


# read_csv with iso-8859-1 encoding; using latin-1 would also work here
candy_full = pd.read_csv('candy.csv', encoding='iso-8859-1')

# copy to new DF so that we can have a copy of the original import if needed
candy = candy_full.copy()
candy


candy.shape

(2479, 120)


# first five rows
candy.head()


# check info about the DataFrame
candy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2479 entries, 0 to 2478
Columns: 120 entries, Internal ID to Click Coordinates (x, y)
dtypes: float64(4), int64(1), object(115)
memory usage: 2.3+ MB


candy.shape

(2479, 120)


# check info, set max_cols
candy.info(max_cols=120)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2479 entries, 0 to 2478
Data columns (total 120 columns):
 #    Column                                                                                 Non-Null Count  Dtype  
---   ------                                                                                 --------------  -----  
 0    Internal ID                                                                            2479 non-null   int64  
 1    Q1: GOING OUT?                                                                         2368 non-null   object 
 2    Q2: GENDER                                                                             2437 non-null   object 
 3    Q3: AGE                                                                                2394 non-null   object 
 4    Q4: COUNTRY                                                                            2414 non-null   object 
 5    Q5: STATE, PROVINCE, COUNTY, ETC                                                       2377 non-null   object 
 6    Q6 | 100 Grand Bar                                                                     1728 non-null   object 
 7    Q6 | Anonymous brown globs that come in black and orange wrappers	(a.k.a. Mary Janes)  1741 non-null   object 
 8    Q6 | Any full-sized candy bar                                                          1803 non-null   object 
 9    Q6 | Black Jacks                                                                       1517 non-null   object 
 10   Q6 | Bonkers (the candy)                                                               1482 non-null   object 
 11   Q6 | Bonkers (the board game)                                                          1469 non-null   object 
 12   Q6 | Bottle Caps                                                                       1709 non-null   object 
 13   Q6 | Box'o'Raisins                                                                     1787 non-null   object 
 14   Q6 | Broken glow stick                                                                 1767 non-null   object 
 15   Q6 | Butterfinger                                                                      1793 non-null   object 
 16   Q6 | Cadbury Creme Eggs                                                                1792 non-null   object 
 17   Q6 | Candy Corn                                                                        1796 non-null   object 
 18   Q6 | Candy that is clearly just the stuff given out for free at restaurants            1784 non-null   object 
 19   Q6 | Caramellos                                                                        1723 non-null   object 
 20   Q6 | Cash, or other forms of legal tender                                              1795 non-null   object 
 21   Q6 | Chardonnay                                                                        1732 non-null   object 
 22   Q6 | Chick-o-Sticks (we donÕt know what that is)                                       1528 non-null   object 
 23   Q6 | Chiclets                                                                          1764 non-null   object 
 24   Q6 | Coffee Crisp                                                                      1621 non-null   object 
 25   Q6 | Creepy Religious comics/Chick Tracts                                              1771 non-null   object 
 26   Q6 | Dental paraphenalia                                                               1783 non-null   object 
 27   Q6 | Dots                                                                              1746 non-null   object 
 28   Q6 | Dove Bars                                                                         1773 non-null   object 
 29   Q6 | Fuzzy Peaches                                                                     1652 non-null   object 
 30   Q6 | Generic Brand Acetaminophen                                                       1744 non-null   object 
 31   Q6 | Glow sticks                                                                       1778 non-null   object 
 32   Q6 | Goo Goo Clusters                                                                  1595 non-null   object 
 33   Q6 | Good N' Plenty                                                                    1741 non-null   object 
 34   Q6 | Gum from baseball cards                                                           1759 non-null   object 
 35   Q6 | Gummy Bears straight up                                                           1778 non-null   object 
 36   Q6 | Hard Candy                                                                        1780 non-null   object 
 37   Q6 | Healthy Fruit                                                                     1781 non-null   object 
 38   Q6 | Heath Bar                                                                         1763 non-null   object 
 39   Q6 | Hershey's Dark Chocolate                                                          1802 non-null   object 
 40   Q6 | HersheyÕs Milk Chocolate                                                          1803 non-null   object 
 41   Q6 | Hershey's Kisses                                                                  1797 non-null   object 
 42   Q6 | Hugs (actual physical hugs)                                                       1762 non-null   object 
 43   Q6 | Jolly Rancher (bad flavor)                                                        1781 non-null   object 
 44   Q6 | Jolly Ranchers (good flavor)                                                      1780 non-null   object 
 45   Q6 | JoyJoy (Mit Iodine!)                                                              1449 non-null   object 
 46   Q6 | Junior Mints                                                                      1777 non-null   object 
 47   Q6 | Senior Mints                                                                      1532 non-null   object 
 48   Q6 | Kale smoothie                                                                     1731 non-null   object 
 49   Q6 | Kinder Happy Hippo                                                                1528 non-null   object 
 50   Q6 | Kit Kat                                                                           1801 non-null   object 
 51   Q6 | LaffyTaffy                                                                        1739 non-null   object 
 52   Q6 | LemonHeads                                                                        1745 non-null   object 
 53   Q6 | Licorice (not black)                                                              1789 non-null   object 
 54   Q6 | Licorice (yes black)                                                              1790 non-null   object 
 55   Q6 | Lindt Truffle                                                                     1757 non-null   object 
 56   Q6 | Lollipops                                                                         1784 non-null   object 
 57   Q6 | Mars                                                                              1750 non-null   object 
 58   Q6 | Maynards                                                                          1450 non-null   object 
 59   Q6 | Mike and Ike                                                                      1746 non-null   object 
 60   Q6 | Milk Duds                                                                         1782 non-null   object 
 61   Q6 | Milky Way                                                                         1787 non-null   object 
 62   Q6 | Regular M&Ms                                                                      1800 non-null   object 
 63   Q6 | Peanut M&MÕs                                                                      1804 non-null   object 
 64   Q6 | Blue M&M's                                                                        1748 non-null   object 
 65   Q6 | Red M&M's                                                                         1746 non-null   object 
 66   Q6 | Green Party M&M's                                                                 1711 non-null   object 
 67   Q6 | Independent M&M's                                                                 1662 non-null   object 
 68   Q6 | Abstained from M&M'ing.                                                           1532 non-null   object 
 69   Q6 | Minibags of chips                                                                 1751 non-null   object 
 70   Q6 | Mint Kisses                                                                       1699 non-null   object 
 71   Q6 | Mint Juleps                                                                       1664 non-null   object 
 72   Q6 | Mr. Goodbar                                                                       1735 non-null   object 
 73   Q6 | Necco Wafers                                                                      1731 non-null   object 
 74   Q6 | Nerds                                                                             1752 non-null   object 
 75   Q6 | Nestle Crunch                                                                     1777 non-null   object 
 76   Q6 | Now'n'Laters                                                                      1657 non-null   object 
 77   Q6 | Peeps                                                                             1765 non-null   object 
 78   Q6 | Pencils                                                                           1766 non-null   object 
 79   Q6 | Pixy Stix                                                                         1753 non-null   object 
 80   Q6 | Real Housewives of Orange County Season 9 Blue-Ray                                1722 non-null   object 
 81   Q6 | ReeseÕs Peanut Butter Cups                                                        1796 non-null   object 
 82   Q6 | Reese's Pieces                                                                    1784 non-null   object 
 83   Q6 | Reggie Jackson Bar                                                                1460 non-null   object 
 84   Q6 | Rolos                                                                             1761 non-null   object 
 85   Q6 | Sandwich-sized bags filled with BooBerry Crunch                                   1699 non-null   object 
 86   Q6 | Skittles                                                                          1769 non-null   object 
 87   Q6 | Smarties (American)                                                               1750 non-null   object 
 88   Q6 | Smarties (Commonwealth)                                                           1573 non-null   object 
 89   Q6 | Snickers                                                                          1785 non-null   object 
 90   Q6 | Sourpatch Kids (i.e. abominations of nature)                                      1737 non-null   object 
 91   Q6 | Spotted Dick                                                                      1593 non-null   object 
 92   Q6 | Starburst                                                                         1782 non-null   object 
 93   Q6 | Sweet Tarts                                                                       1767 non-null   object 
 94   Q6 | Swedish Fish                                                                      1760 non-null   object 
 95   Q6 | Sweetums (a friend to diabetes)                                                   1472 non-null   object 
 96   Q6 | Take 5                                                                            1557 non-null   object 
 97   Q6 | Tic Tacs                                                                          1761 non-null   object 
 98   Q6 | Those odd marshmallow circus peanut things                                        1739 non-null   object 
 99   Q6 | Three Musketeers                                                                  1767 non-null   object 
 100  Q6 | Tolberone something or other                                                      1769 non-null   object 
 101  Q6 | Trail Mix                                                                         1767 non-null   object 
 102  Q6 | Twix                                                                              1785 non-null   object 
 103  Q6 | Vials of pure high fructose corn syrup, for main-lining into your vein            1683 non-null   object 
 104  Q6 | Vicodin                                                                           1686 non-null   object 
 105  Q6 | Whatchamacallit Bars                                                              1652 non-null   object 
 106  Q6 | White Bread                                                                       1718 non-null   object 
 107  Q6 | Whole Wheat anything                                                              1728 non-null   object 
 108  Q6 | York Peppermint Patties                                                           1770 non-null   object 
 109  Q7: JOY OTHER                                                                          917 non-null    object 
 110  Q8: DESPAIR OTHER                                                                      723 non-null    object 
 111  Q9: OTHER COMMENTS                                                                     389 non-null    object 
 112  Q10: DRESS                                                                             1730 non-null   object 
 113  Unnamed: 113                                                                           9 non-null      object 
 114  Q11: DAY                                                                               1750 non-null   object 
 115  Q12: MEDIA [Daily Dish]                                                                85 non-null     float64
 116  Q12: MEDIA [Science]                                                                   1376 non-null   float64
 117  Q12: MEDIA [ESPN]                                                                      99 non-null     float64
 118  Q12: MEDIA [Yahoo]                                                                     67 non-null     float64
 119  Click Coordinates (x, y)                                                               1619 non-null   object 
dtypes: float64(4), int64(1), object(115)
memory usage: 2.3+ MB


# print a list of column names
for col in candy.columns:
    print(col)

Internal ID
Q1: GOING OUT?
Q2: GENDER
Q3: AGE
Q4: COUNTRY
Q5: STATE, PROVINCE, COUNTY, ETC
Q6 | 100 Grand Bar
Q6 | Anonymous brown globs that come in black and orange wrappers	(a.k.a. Mary Janes)
Q6 | Any full-sized candy bar
Q6 | Black Jacks
Q6 | Bonkers (the candy)
Q6 | Bonkers (the board game)
Q6 | Bottle Caps
Q6 | Box'o'Raisins
Q6 | Broken glow stick
Q6 | Butterfinger
Q6 | Cadbury Creme Eggs
Q6 | Candy Corn
Q6 | Candy that is clearly just the stuff given out for free at restaurants
Q6 | Caramellos
Q6 | Cash, or other forms of legal tender
Q6 | Chardonnay
Q6 | Chick-o-Sticks (we donÕt know what that is)
Q6 | Chiclets
Q6 | Coffee Crisp
Q6 | Creepy Religious comics/Chick Tracts
Q6 | Dental paraphenalia
Q6 | Dots
Q6 | Dove Bars
Q6 | Fuzzy Peaches
Q6 | Generic Brand Acetaminophen
Q6 | Glow sticks
Q6 | Goo Goo Clusters
Q6 | Good N' Plenty
Q6 | Gum from baseball cards
Q6 | Gummy Bears straight up
Q6 | Hard Candy
Q6 | Healthy Fruit
Q6 | Heath Bar
Q6 | Hershey's Dark Chocolate
Q6 | HersheyÕs Milk Chocolate
Q6 | Hershey's Kisses
Q6 | Hugs (actual physical hugs)
Q6 | Jolly Rancher (bad flavor)
Q6 | Jolly Ranchers (good flavor)
Q6 | JoyJoy (Mit Iodine!)
Q6 | Junior Mints
Q6 | Senior Mints
Q6 | Kale smoothie
Q6 | Kinder Happy Hippo
Q6 | Kit Kat
Q6 | LaffyTaffy
Q6 | LemonHeads
Q6 | Licorice (not black)
Q6 | Licorice (yes black)
Q6 | Lindt Truffle
Q6 | Lollipops
Q6 | Mars
Q6 | Maynards
Q6 | Mike and Ike
Q6 | Milk Duds
Q6 | Milky Way
Q6 | Regular M&Ms
Q6 | Peanut M&MÕs
Q6 | Blue M&M's
Q6 | Red M&M's
Q6 | Green Party M&M's
Q6 | Independent M&M's
Q6 | Abstained from M&M'ing.
Q6 | Minibags of chips
Q6 | Mint Kisses
Q6 | Mint Juleps
Q6 | Mr. Goodbar
Q6 | Necco Wafers
Q6 | Nerds
Q6 | Nestle Crunch
Q6 | Now'n'Laters
Q6 | Peeps
Q6 | Pencils
Q6 | Pixy Stix
Q6 | Real Housewives of Orange County Season 9 Blue-Ray
Q6 | ReeseÕs Peanut Butter Cups
Q6 | Reese's Pieces
Q6 | Reggie Jackson Bar
Q6 | Rolos
Q6 | Sandwich-sized bags filled with BooBerry Crunch
Q6 | Skittles
Q6 | Smarties (American)
Q6 | Smarties (Commonwealth)
Q6 | Snickers
Q6 | Sourpatch Kids (i.e. abominations of nature)
Q6 | Spotted Dick
Q6 | Starburst
Q6 | Sweet Tarts
Q6 | Swedish Fish
Q6 | Sweetums (a friend to diabetes)
Q6 | Take 5
Q6 | Tic Tacs
Q6 | Those odd marshmallow circus peanut things
Q6 | Three Musketeers
Q6 | Tolberone something or other
Q6 | Trail Mix
Q6 | Twix
Q6 | Vials of pure high fructose corn syrup, for main-lining into your vein
Q6 | Vicodin
Q6 | Whatchamacallit Bars
Q6 | White Bread
Q6 | Whole Wheat anything
Q6 | York Peppermint Patties
Q7: JOY OTHER
Q8: DESPAIR OTHER
Q9: OTHER COMMENTS
Q10: DRESS
Unnamed: 113
Q11: DAY
Q12: MEDIA [Daily Dish]
Q12: MEDIA [Science]
Q12: MEDIA [ESPN]
Q12: MEDIA [Yahoo]
Click Coordinates (x, y)


# Identify the columns with special character Õ 
 #We find one column under Q6 | HersheyÕs Milk Chocolate

candy.filter(like="col").select_dtypes("object").columns

#This gives us the following, before we replace

#Index(['Q6 | Hershey's Dark Chocolate', 'Q6 | HersheyÕs Milk Chocolate'], dtype='object')

Index(['Q6 | Hershey's Dark Chocolate', 'Q6 | HersheyÕs Milk Chocolate'], dtype='object')


# Now we rename Q6 | HersheyÕs Milk Chocolate column as Q6_new | 'Hershey's Milk Chocolate'
# And in a way we replace now the special character Õ to an apostrophe

candy = candy.rename(columns={"Q6 | HersheyÕs Milk Chocolate": "Q6_new | Hershey's Milk Chocolate"})  ### ENTER CODE HERE ###
candy


#Check 1: We get the following change, after renaming and replacing 

candy.filter(like="col").select_dtypes("object").columns

Index(['Q6 | Hershey's Dark Chocolate', 'Q6_new | Hershey's Milk Chocolate'], dtype='object')


#Check#2 Looking into this dataset
candy["Q6 | Hershey's Dark Chocolate"]

0       NaN
1       JOY
2       NaN
3       MEH
4       JOY
       ... 
2474    JOY
2475    MEH
2476    JOY
2477    NaN
2478    JOY
Name: Q6 | Hershey's Dark Chocolate, Length: 2479, dtype: object


#Number of duplicated rows
Q1 = candy.duplicated().sum()      
Q1

17


#Number of duplicated rows with same Internal ID number
Q2 = candy[candy.duplicated('Internal ID')]          
Q2 = len(Q2)
Q2

19


##Drop any duplicates from the candy DataFrame

#subset is used to remove duplicates from specific column
#To remove duplicates from entire dataset, we use: df.drop_duplicates()
# keep option is set to ‘first’ to remove duplicates and keep the first occurrences only
# keep option is set to False to remove all the occurrences of duplicate column(s)

candy = candy.drop_duplicates(subset = 'Internal ID', keep='first') 
candy


#Remove the following multiple columns from the candy DataFrame:

candy = candy.drop(
    columns=["Internal ID","Q5: STATE, PROVINCE, COUNTY, ETC",
             "Q7: JOY OTHER", "Q8: DESPAIR OTHER", "Q9: OTHER COMMENTS", 
             "Unnamed: 113", "Click Coordinates (x, y)"], axis = 1)   

candy


#Code check
candy.shape

(2460, 113)


# view value counts for Q2: GENDER
candy['Q2: GENDER'].value_counts()

Male                  1466
Female                 839
I'd rather not say      83
Other                   30
Name: Q2: GENDER, dtype: int64


# We use this to create True/False mask
candy['Q2: GENDER'].isnull()

0        True
1       False
2       False
3       False
4       False
        ...  
2474    False
2475    False
2476    False
2477    False
2478    False
Name: Q2: GENDER, Length: 2460, dtype: bool


# Number of missing values from data
Q3 = candy['Q2: GENDER'].isnull().sum()        
Q3

42


# Remove all rows with a missing value in the Q2: GENDER column

candy = candy.dropna(subset=['Q2: GENDER'])


#Candy shape                  #from 2460 to 2418 = 2460-42
candy.shape

(2418, 113)


#Code check
candy['Q2: GENDER'].isnull().sum()

0


# Select only the rows that contain either Male or Female in the Q2: GENDER column
candy = candy[candy['Q2: GENDER'].isin(['Male', 'Female'])]
candy


#Code check
candy.shape

(2305, 113)


Q4 = candy['Q1: GOING OUT?'].isnull().sum()       #77 
Q4

77


#Fill all missing values in the Q1: GOING OUT? column with a 'No' value.
candy['Q1: GOING OUT?'].fillna('No', inplace=True)


#Code check  #total No + Yes = 2305

candy['Q1: GOING OUT?'].value_counts()

No     2007
Yes     298
Name: Q1: GOING OUT?, dtype: int64


#code check for missing values
candy['Q1: GOING OUT?'].isnull().sum()

0


#Select all columns going from: Q6 | 100 Grand Bar to Q11: DAY
# Save this slice as Q5
Q5 = candy.loc[:,'Q6 | 100 Grand Bar':'Q11: DAY']
Q5


#Fill any missing values in the candy DataFrame for those columns 
# (going from Q6 | 100 Grand Bar to Q11: DAY) with the string NO_ANSWER
#I call this new df as Q51, to latter help us join with anther new df

Q51 = candy.loc[:,'Q6 | 100 Grand Bar':'Q11: DAY'].fillna('NO_ANSWER')
Q51


#For all four Q12: Media columns in the candy DataFrame
#Fill the missing values with 0.0
#Let's name this new df as Q52

candy.update(candy[['Q12: MEDIA [Daily Dish]', 
                    'Q12: MEDIA [Science]', 
                    'Q12: MEDIA [ESPN]', 
                    'Q12: MEDIA [Yahoo]']].fillna(0))


#1/ Code check for Q12 columns 

     #After First checking, I name this new df as Q52

Q52 = candy[['Q12: MEDIA [Daily Dish]',
             'Q12: MEDIA [Science]',
             'Q12: MEDIA [ESPN]',
             'Q12: MEDIA [Yahoo]']]
Q52


#Now we combine Q51 and Q52 together,and check all missed values filled/replaced
#We use the concat method to join both DFs side by side(hstack)

# We save this new df in candy1 t

candy1 = pd.concat([Q51, Q52], axis=1)
candy1


# Code check for missing values: 'Q6 | 100 Grand Bar':'Q12: Media'

candy1.isnull().sum()

Q6 | 100 Grand Bar                                                                        0
Q6 | Anonymous brown globs that come in black and orange wrappers\t(a.k.a. Mary Janes)    0
Q6 | Any full-sized candy bar                                                             0
Q6 | Black Jacks                                                                          0
Q6 | Bonkers (the candy)                                                                  0
                                                                                         ..
Q11: DAY                                                                                  0
Q12: MEDIA [Daily Dish]                                                                   0
Q12: MEDIA [Science]                                                                      0
Q12: MEDIA [ESPN]                                                                         0
Q12: MEDIA [Yahoo]                                                                        0
Length: 109, dtype: int64


#Let's assign the rest column a new df: cand2
candy2 = candy.loc[:,'Q1: GOING OUT?':'Q4: COUNTRY']
candy2


# Now let's combine candy1, and candy2 to get modified new candy dataset
# We save it in candy DataFrame

candy = pd.concat([candy2, candy1], axis=1)
candy


# check unique values, in Q4: COUNTRY column

candy['Q4: COUNTRY'].unique()

array(['USA ', 'USA', 'us', 'usa', nan, 'canada', 'Canada', 'Us', 'US',
       'Murica', 'United States', 'uk', 'United Kingdom', 'united states',
       'Usa', 'United States ', 'United staes',
       'United States of America', 'UAE', 'England', 'UK', 'canada ',
       'United states', 'u.s.a.', '35', 'france',
       'United States of America ', 'america', 'U.S.A.', 'finland',
       'unhinged states', 'Mexico', 'Canada ', 'united states of america',
       'US of A', 'The United States', 'North Carolina ', 'Unied States',
       'Netherlands', 'germany', 'Europe', 'U S', 'u.s.', 'U.K. ',
       'Costa Rica', 'The United States of America', 'unite states',
       'U.S.', '46', 'Australia', 'Greece', 'USA? Hard to tell anymore..',
       "'merica", '45', 'United State', '32', 'France', 'australia',
       'Can', 'Canae', 'Trumpistan', 'Ireland', 'United Sates', 'Korea',
       'California', 'Unites States', 'Japan', 'USa', 'South africa',
       'I pretend to be from Canada, but I am really from the United States.',
       'Usa ', 'Uk', 'Germany', 'Canada`', 'Scotland', 'UK ', 'Denmark',
       'United Stated', 'France ', 'Switzerland', 'UD', 'Scotland ',
       'South Korea', 'New Jersey', 'CANADA', 'Indonesia',
       'United ststes', 'America', 'The Netherlands', 'United Statss',
       'endland', 'murrika', 'USA! USA! USA!', 'USAA', 'united States ',
       'soviet canuckistan', 'N. America', 'Singapore', 'USSA', 'China',
       'Taiwan', 'Ireland ', 'hong kong', 'spain', 'Sweden', 'Hong Kong',
       'U.S. ', 'Narnia', 'u s a', 'United Statea', 'united ststes',
       'subscribe to dm4uz3 on youtube', 'United kingdom',
       'USA USA USA!!!!', "I don't know anymore", 'Fear and Loathing'],
      dtype=object)


# check the Q4: COUNTRY number of unique values

candy['Q4: COUNTRY'].nunique()

115


#View the Q4: Country column
candy['Q4: COUNTRY']

1       USA 
2        USA
3         us
4        usa
5        NaN
        ... 
2474     USA
2475     USA
2476    USA 
2477     Usa
2478     usa
Name: Q4: COUNTRY, Length: 2305, dtype: object


#Check for value counts
candy['Q4: COUNTRY'].value_counts()

USA                            669
United States                  475
usa                            210
Canada                         168
Usa                            131
                              ... 
United State                     1
45                               1
'merica                          1
USA? Hard to tell anymore..      1
Fear and Loathing                1
Name: Q4: COUNTRY, Length: 115, dtype: int64


#Fill the missing values in the Q4: Country column with 'Other'

candy['Q4: COUNTRY'] = candy['Q4: COUNTRY'].fillna('Other')
candy['Q4: COUNTRY']

1        USA 
2         USA
3          us
4         usa
5       Other
        ...  
2474      USA
2475      USA
2476     USA 
2477      Usa
2478      usa
Name: Q4: COUNTRY, Length: 2305, dtype: object


# check missing Q4 values
candy['Q4: COUNTRY'].isnull().sum()

0


# check unique values 

candy['Q4: COUNTRY'].nunique()

116


#Combine all Australia entries into Other
candy['Q4: COUNTRY'].replace({'Australia':'Other',
                              'australia':'Other'
                             }, inplace=True)


# check unique values
candy['Q4: COUNTRY'].nunique()

114


candy['Q4: COUNTRY'] == 'O'

1       False
2       False
3       False
4       False
5       False
        ...  
2474    False
2475    False
2476    False
2477    False
2478    False
Name: Q4: COUNTRY, Length: 2305, dtype: bool


#Combine all United States entries together into USA
candy['Q4: COUNTRY'].replace({ 
    'USA ':'USA', 'USA':'USA', 'us':'USA', 'usa':'USA', 'Us':'USA', 'US':'USA', 
    'Murica':'USA', 'United States':'USA', 'united states':'USA', 
    'Usa':'USA', 'United States ':'USA', 'United staes':'USA', 
    'United States of America':'USA', 'United states':'USA', 
    'u.s.a.':'USA', 'United States of America ':'USA', 
    'america':'USA', 'U.S.A.':'USA', 'unhinged states':'USA', 
    'united states of america':'USA', 'US of A':'USA', 
    'The United States':'USA', 'North Carolina ':'USA', 
    'Unied States':'USA', 'U S':'USA', 'u.s.':'USA', 
    'The United States of America':'USA', 
    'unite states':'USA','U.S.':'USA', 'USA? Hard to tell anymore..':'USA', 
    "'merica":'USA', 'United State':'USA', 'United Sates':'USA', 
    'California':'USA', 'Unites States':'USA', 'USa':'USA', 
    'I pretend to be from Canada, but I am really from the United States.':'USA', 
    'Usa ':'USA', 'United Stated':'USA', 'New Jersey':'USA', 'United ststes':'USA', 
    'America':'USA', 'United Statss':'USA', 'murrika':'USA', 
    'USA! USA! USA!':'USA', 'USAA':'USA', 'united States ':'USA', 
    'N. America':'USA', 'USSA':'USA', 'U.S. ':'USA', 'u s a':'USA', 
    'United Statea':'USA', 'united ststes':'USA', 
    'USA USA USA!!!!':'USA'
}, inplace=True)


# check unique values
candy['Q4: COUNTRY'].nunique()

61


#Combine the Canadian entries (both upper and lower case) 
# And label them as CA
candy['Q4: COUNTRY'].replace({
    'canada':'CA', 'Canada':'CA', 'canada ':'CA', 
    'Canada ':'CA', 'Can':'CA', 'Canae':'CA', 
    'Canada`':'CA', 'CANADA':'CA'}, inplace=True)


# check unique values
candy['Q4: COUNTRY'].nunique()

54


candy['Q4: COUNTRY'].replace({
    'uk':'EU','United Kingdom':'EU', 
    'England':'EU', 'UK':'EU', 'france':'EU', 
    'finland':'EU', 'Netherlands':'EU', 
    'germany':'EU', 'Europe':'EU', 'U.K. ':'EU', 
    'Greece':'EU', 'France':'EU', 'Ireland':'EU', 
    'Uk':'EU', 'Germany':'EU', 'Scotland':'EU', 
    'UK ':'EU', 'Denmark':'EU', 'France ':'EU', 
    'Switzerland':'EU', 'Scotland ':'EU', 
    'The Netherlands':'EU', 'Ireland ':'EU', 
    'spain':'EU', 'Sweden':'EU', 
    'United kingdom':'EU'}, inplace=True)


# check unique values
candy['Q4: COUNTRY'].nunique()

29


#Combine the other entries and label them as 'Other'

candy['Q4: COUNTRY'].replace({
    'Murica':'Other','nan':'Other','UAE':'Other','35':'Other',
    'unhinged states':'Other','Mexico':'Other','Costa Rica':'Other','46':'Other',
    'Other':'Other','45':'Other','32':'Other','Trumpistan':'Other','Korea':'Other',
    'Japan':'Other','South africa':'Other','UD':'Other', 'South Korea':'Other',
    'Indonesia':'Other','endland':'Other', 'murrika':'Other', 'soviet canuckistan':'Other', 
    'Singapore':'Other', 'China':'Other','Taiwan':'Other', 'hong kong':'Other', 'spain':'Other', 
    'Sweden':'Other','Hong Kong':'Other','Narnia':'Other', 'subscribe to dm4uz3 on youtube':'Other', 
    "I don't know anymore":'Other','Fear and Loathing':'Other'
}, inplace=True)


# check unique values
candy['Q4: COUNTRY'].nunique()

4


## Alternative option:

# All other values will be grouped as 'other'
# So we can use the following code
# ~ means, negating(take the opposite)

# candy['Q4: COUNTRY'] != 'USA', 'CA', 'EU' 

candy.loc[~candy['Q4: COUNTRY'].isin(['USA', 'CA','EU' ]), 'Q4: COUNTRY'] = 'Other'


# Save the value counts of the Q4: COUNTRY column as Q6

Q6 = candy['Q4: COUNTRY'].value_counts()    
Q6

USA      1955
CA        216
EU         73
Other      61
Name: Q4: COUNTRY, dtype: int64


candy['Q4: COUNTRY'].isnull().sum()

0


# check unique age values for the age column
candy['Q3: AGE'].nunique()

95


candy['Q3: AGE'].unique()

array(['44', '49', '40', '23', nan, '53', '33', '43', '56', '64', '37',
       '48', '54', '36', '45', '25', '34', '35', '38', '58', '50', '47',
       '16', '52', '63', '65', '41', '27', '31', '59', '61', '46', '42',
       '62', '29', '39', '32', '28', '69', '67', '30', '22', '51', '70',
       '24', '19', 'Old enough', '57', '60', '66', '12', 'Many', '55',
       '72', '?', '21', '11', 'no', '9', '68', '20', '6', '10', '71',
       '13', '26', '45-55', '7', '39.4', '74', '18', 'older than dirt',
       '17', '15', '8', '75', '5u', 'Enough', 'Over 50', '90', '76',
       'sixty-nine', 'ancient', '77', 'OLD', 'old', '73', '70 1/2', '14',
       'MY NAME JEFF', '4', '59 on the day after Halloween', 'old enough',
       'your mom', 'I can remember when Java was a cool new language',
       '60+'], dtype=object)


# create True/False index
age_index = candy['Q3: AGE'].str.isnumeric()

# for the index, fill missing values with False
age_index = age_index.fillna(False)


# select Age column for only those False values from index and code as missing
candy.loc[~age_index, 'Q3: AGE'] = np.nan


#Viewing missing values with False indeces

candy['Q3: AGE'].isnull()

1       False
2       False
3       False
4       False
5        True
        ...  
2474    False
2475    False
2476    False
2477    False
2478    False
Name: Q3: AGE, Length: 2305, dtype: bool


#Number of missing values in 'Q3: AGE' column
candy['Q3: AGE'].isnull().sum()

60


#Convert to float dtype

candy['Q3: AGE'] = candy['Q3: AGE'].astype(float)
candy['Q3: AGE']

1       44.0
2       49.0
3       40.0
4       23.0
5        NaN
        ... 
2474    24.0
2475    33.0
2476    26.0
2477    58.0
2478    66.0
Name: Q3: AGE, Length: 2305, dtype: float64


##Your task is to put these values into the following categorical bins: 
# unknown, 17 and under, 18-25, 26-35, 36-45, 46-55, and 56+
    
## We firt Create the following bins for the 'Q3: AGE' column:
# 17 and under, 18-25, 26-35, 36-45, 46-55, and 56+
    
candy.loc[:,'Q3: AGE'] = pd.cut(candy['Q3: AGE'], bins=[0, 17, 25, 35, 45, 55, 100],
                          labels=['17 and under','18-25', '26-35', '36-45', '46-55', '56+'])


#Check code
candy['Q3: AGE']

1       36-45
2       46-55
3       36-45
4       18-25
5         NaN
        ...  
2474    18-25
2475    26-35
2476    26-35
2477      56+
2478      56+
Name: Q3: AGE, Length: 2305, dtype: category
Categories (6, object): ['17 and under' < '18-25' < '26-35' < '36-45' < '46-55' < '56+']


## add 'unknown' category in the above df(labels)

candy['Q3: AGE'] = candy['Q3: AGE'].cat.add_categories('unknown').cat.reorder_categories(['unknown', '17 and under','18-25', '26-35', '36-45', '46-55', '56+'])

# fill the NaN missing values with 'unknown'

candy.loc[:,'Q3: AGE'] = candy.loc[:,'Q3: AGE'].astype('category').fillna('unknown')


candy['Q3: AGE']

1         36-45
2         46-55
3         36-45
4         18-25
5       unknown
         ...   
2474      18-25
2475      26-35
2476      26-35
2477        56+
2478        56+
Name: Q3: AGE, Length: 2305, dtype: category
Categories (7, object): ['unknown' < '17 and under' < '18-25' < '26-35' < '36-45' < '46-55' < '56+']


#checking the categories for the Q3: AGE column

bins = [0, 18, 25, 35, 45, 55, 100]
labels =['unknown', '17 and under', '18-25', '26-35', '36-45', '46-55', '56+']
      
index = pd.DataFrame(candy['Q3: AGE'], labels, bins)
index.index.astype(object)

Index(['unknown', '17 and under', '18-25', '26-35', '36-45', '46-55', '56+'], dtype='object')


#Sort values for candy['Q3: AGE'] column, as above

candy['Q3: AGE'].value_counts().sort_index(ascending = True)

unknown          60
17 and under     49
18-25            85
26-35           520
36-45           768
46-55           525
56+             298
Name: Q3: AGE, dtype: int64


#Checking missing value

candy['Q3: AGE'].isnull().sum()

0


#Check to see if there are any missing values in the data set. 
# The output should show 0

candy.isnull().sum()

Q1: GOING OUT?             0
Q2: GENDER                 0
Q3: AGE                    0
Q4: COUNTRY                0
Q6 | 100 Grand Bar         0
                          ..
Q11: DAY                   0
Q12: MEDIA [Daily Dish]    0
Q12: MEDIA [Science]       0
Q12: MEDIA [ESPN]          0
Q12: MEDIA [Yahoo]         0
Length: 113, dtype: int64


### Reset index for candy ensuring that it goes from 0 to n-1

# We apply the (start, stop, step) method 

candy.index = pd.RangeIndex(start=0, stop=len(candy1), step=1)
candy


### Option 2:

#Reset index and retain old index as a column
# candy = candy.reset_index(drop=True)

# candy


##There were 2479 rows in the original uncleaned data that we imported
# 2479 rows × 120 columns

candy_full.shape

(2479, 120)


##we end up with 2305 rows, removing/cleaning from the old data set
#2305 rows × 113 columns

candy.shape

(2305, 113)


# How many did we end up removing from the data set?  
# Rows = 2479 rows  - 2305 rows = 174 rows

pd.concat([candy_full, candy]).loc[candy_full.index.symmetric_difference(candy.index)]


# number of respondents planning to go out

candy['Q1: GOING OUT?'].value_counts()   #-- 298 number of respondents

No     2007
Yes     298
Name: Q1: GOING OUT?, dtype: int64


# percentages of respondents planning to go out


print(candy['Q1: GOING OUT?'].value_counts(normalize=True).sort_index(ascending = False))

print('======================================================')

print('About 12% or 13% respondents are planning to go out')

Yes    0.129284
No     0.870716
Name: Q1: GOING OUT?, dtype: float64
======================================================
About 12% or 13% respondents are planning to go out


#percentage of respondents 17 and younger 

# Answer: 0.2 (2%)

print(candy['Q3: AGE'].value_counts(normalize=True).sort_index(ascending = True))

print('==========================================================')

print('About 2% respondents 17 and younger are planning to go out')

unknown         0.026030
17 and under    0.021258
18-25           0.036876
26-35           0.225597
36-45           0.333189
46-55           0.227766
56+             0.129284
Name: Q3: AGE, dtype: float64
==========================================================
About 2% respondents 17 and younger are planning to go out


#Total value counts and the normalized percentages of the age categories

print(candy['Q3: AGE'].value_counts().sum())

print('============================')

print(candy['Q3: AGE'].value_counts(normalize=True).sort_index(ascending = True))

2305
============================
unknown         0.026030
17 and under    0.021258
18-25           0.036876
26-35           0.225597
36-45           0.333189
46-55           0.227766
56+             0.129284
Name: Q3: AGE, dtype: float64


# We can print this for a list of column names

for col in candy.columns:
    print(col)

Q1: GOING OUT?
Q2: GENDER
Q3: AGE
Q4: COUNTRY
Q6 | 100 Grand Bar
Q6 | Anonymous brown globs that come in black and orange wrappers	(a.k.a. Mary Janes)
Q6 | Any full-sized candy bar
Q6 | Black Jacks
Q6 | Bonkers (the candy)
Q6 | Bonkers (the board game)
Q6 | Bottle Caps
Q6 | Box'o'Raisins
Q6 | Broken glow stick
Q6 | Butterfinger
Q6 | Cadbury Creme Eggs
Q6 | Candy Corn
Q6 | Candy that is clearly just the stuff given out for free at restaurants
Q6 | Caramellos
Q6 | Cash, or other forms of legal tender
Q6 | Chardonnay
Q6 | Chick-o-Sticks (we donÕt know what that is)
Q6 | Chiclets
Q6 | Coffee Crisp
Q6 | Creepy Religious comics/Chick Tracts
Q6 | Dental paraphenalia
Q6 | Dots
Q6 | Dove Bars
Q6 | Fuzzy Peaches
Q6 | Generic Brand Acetaminophen
Q6 | Glow sticks
Q6 | Goo Goo Clusters
Q6 | Good N' Plenty
Q6 | Gum from baseball cards
Q6 | Gummy Bears straight up
Q6 | Hard Candy
Q6 | Healthy Fruit
Q6 | Heath Bar
Q6 | Hershey's Dark Chocolate
Q6_new | Hershey's Milk Chocolate
Q6 | Hershey's Kisses
Q6 | Hugs (actual physical hugs)
Q6 | Jolly Rancher (bad flavor)
Q6 | Jolly Ranchers (good flavor)
Q6 | JoyJoy (Mit Iodine!)
Q6 | Junior Mints
Q6 | Senior Mints
Q6 | Kale smoothie
Q6 | Kinder Happy Hippo
Q6 | Kit Kat
Q6 | LaffyTaffy
Q6 | LemonHeads
Q6 | Licorice (not black)
Q6 | Licorice (yes black)
Q6 | Lindt Truffle
Q6 | Lollipops
Q6 | Mars
Q6 | Maynards
Q6 | Mike and Ike
Q6 | Milk Duds
Q6 | Milky Way
Q6 | Regular M&Ms
Q6 | Peanut M&MÕs
Q6 | Blue M&M's
Q6 | Red M&M's
Q6 | Green Party M&M's
Q6 | Independent M&M's
Q6 | Abstained from M&M'ing.
Q6 | Minibags of chips
Q6 | Mint Kisses
Q6 | Mint Juleps
Q6 | Mr. Goodbar
Q6 | Necco Wafers
Q6 | Nerds
Q6 | Nestle Crunch
Q6 | Now'n'Laters
Q6 | Peeps
Q6 | Pencils
Q6 | Pixy Stix
Q6 | Real Housewives of Orange County Season 9 Blue-Ray
Q6 | ReeseÕs Peanut Butter Cups
Q6 | Reese's Pieces
Q6 | Reggie Jackson Bar
Q6 | Rolos
Q6 | Sandwich-sized bags filled with BooBerry Crunch
Q6 | Skittles
Q6 | Smarties (American)
Q6 | Smarties (Commonwealth)
Q6 | Snickers
Q6 | Sourpatch Kids (i.e. abominations of nature)
Q6 | Spotted Dick
Q6 | Starburst
Q6 | Sweet Tarts
Q6 | Swedish Fish
Q6 | Sweetums (a friend to diabetes)
Q6 | Take 5
Q6 | Tic Tacs
Q6 | Those odd marshmallow circus peanut things
Q6 | Three Musketeers
Q6 | Tolberone something or other
Q6 | Trail Mix
Q6 | Twix
Q6 | Vials of pure high fructose corn syrup, for main-lining into your vein
Q6 | Vicodin
Q6 | Whatchamacallit Bars
Q6 | White Bread
Q6 | Whole Wheat anything
Q6 | York Peppermint Patties
Q10: DRESS
Q11: DAY
Q12: MEDIA [Daily Dish]
Q12: MEDIA [Science]
Q12: MEDIA [ESPN]
Q12: MEDIA [Yahoo]


#Total counts for the gender column
##Total percentages for the gender column

print(candy['Q2: GENDER'].value_counts().sum())

print('========================')

print(candy['Q2: GENDER'].value_counts(normalize=True).sort_index(ascending = True))

2305
========================
Female    0.363991
Male      0.636009
Name: Q2: GENDER, dtype: float64


# Breakdown of counts for the country column

candy['Q4: COUNTRY'].value_counts()

USA      1955
CA        216
EU         73
Other      61
Name: Q4: COUNTRY, dtype: int64


#Total vote respondents for JOY and DESPAIR in candy corn 
#Total percentages of vote respondents

print(candy['Q6 | Candy Corn'].value_counts())

print('=================================')

print(candy['Q6 | Candy Corn'].value_counts(normalize=True).round(2))

DESPAIR      702
NO_ANSWER    620
MEH          529
JOY          454
Name: Q6 | Candy Corn, dtype: int64
=================================
DESPAIR      0.30
NO_ANSWER    0.27
MEH          0.23
JOY          0.20
Name: Q6 | Candy Corn, dtype: float64


#Total percentages of vote respondents for JOY and DESPAIR in candy corn 

print(candy['Q6 | Candy Corn'].value_counts(normalize=True))

print('=================================================')

print('More people voted for DESPAIR in candy corn = 30% ')

DESPAIR      0.304555
NO_ANSWER    0.268980
MEH          0.229501
JOY          0.196963
Name: Q6 | Candy Corn, dtype: float64
=================================================
More people voted for DESPAIR in candy corn = 30%


# candy['Q6 | Reese\'s Peanut Butter Cups'].value_counts()    #check Reese\'s Peanut Butter Cups spelling


candy['Q6 | Snickers'].value_counts()

JOY          1325
NO_ANSWER     633
MEH           273
DESPAIR        74
Name: Q6 | Snickers, dtype: int64


candy['Q6 | Twix'].value_counts()

JOY          1339
NO_ANSWER     630
MEH           269
DESPAIR        67
Name: Q6 | Twix, dtype: int64


candy['Q6 | Kit Kat'].value_counts()

JOY          1367
NO_ANSWER     616
MEH           275
DESPAIR        47
Name: Q6 | Kit Kat, dtype: int64


print(candy['Q6 | White Bread'].value_counts())

DESPAIR      1376
NO_ANSWER     694
MEH           192
JOY            43
Name: Q6 | White Bread, dtype: int64


print(candy['Q6 | Whole Wheat anything'].value_counts())  

print('===============================================')

print('More people voted joy on whole wheat items')

DESPAIR      1233
NO_ANSWER     684
MEH           278
JOY           110
Name: Q6 | Whole Wheat anything, dtype: int64
===============================================
More people voted joy on whole wheat items


candy['Q6 | Bonkers (the board game)'].value_counts()

NO_ANSWER    926
MEH          673
DESPAIR      518
JOY          188
Name: Q6 | Bonkers (the board game), dtype: int64


print(candy['Q6 | Bonkers (the candy)'].value_counts()) 

print('===============================================')

print('More people voted JOY for the board game than the candy = 188')

NO_ANSWER    911
MEH          818
DESPAIR      467
JOY          109
Name: Q6 | Bonkers (the candy), dtype: int64
===============================================
More people voted JOY for the board game than the candy = 188


# candy['Box Of Raisins'].value_counts()


candy['Q6 | Candy Corn'].value_counts()

DESPAIR      702
NO_ANSWER    620
MEH          529
JOY          454
Name: Q6 | Candy Corn, dtype: int64


# favorite day of the week by total counts

print(candy['Q11: DAY'].value_counts())

Friday       1026
NO_ANSWER     658
Sunday        621
Name: Q11: DAY, dtype: int64


# favorite day of the week by percentage

print(candy['Q11: DAY'].value_counts(normalize=True).round(2))

print('===============================================')

print('Friday is the favorite day of the week for the respondents, both by total counts = 1026, and percentages = 45% )')

Friday       0.45
NO_ANSWER    0.29
Sunday       0.27
Name: Q11: DAY, dtype: float64
===============================================
Friday is the favorite day of the week for the respondents, both by total counts = 1026, and percentages = 45% )


# color of the dress (total counts)

print(candy['Q10: DRESS'].value_counts())

White and gold    1027
NO_ANSWER          679
Blue and black     599
Name: Q10: DRESS, dtype: int64


# color of the dress (percentages)

print(candy['Q10: DRESS'].value_counts(normalize=True).round(2))

print('===========================================================================')  

print('More respondents see White and gold for the color of the dress (1027 = 45%)')

White and gold    0.45
NO_ANSWER         0.29
Blue and black    0.26
Name: Q10: DRESS, dtype: float64
===========================================================================
More respondents see White and gold for the color of the dress (1027 = 45%)


print(candy['Q12: MEDIA [Yahoo]'].value_counts())

0.0    2244
1.0      61
Name: Q12: MEDIA [Yahoo], dtype: int64


# number of Males and Females voted joy

print(candy[['Q2: GENDER', 'Q6 | Real Housewives of Orange County Season 9 Blue-Ray']].value_counts())

print('====================================')  

print('45 males and 41 Females voted joy')

Q2: GENDER  Q6 | Real Housewives of Orange County Season 9 Blue-Ray
Male        DESPAIR                                                    901
Female      DESPAIR                                                    497
Male        NO_ANSWER                                                  439
Female      NO_ANSWER                                                  253
Male        MEH                                                         81
Female      MEH                                                         48
Male        JOY                                                         45
Female      JOY                                                         41
dtype: int64
====================================
45 males and 41 Females voted joy


# number of respondents 17 or younger who voted joy

print(candy[['Q3: AGE','Q6 | Real Housewives of Orange County Season 9 Blue-Ray']].value_counts())

print('====================================')  

print('2 respondents 17 or younger voted joy')

Q3: AGE       Q6 | Real Housewives of Orange County Season 9 Blue-Ray
36-45         DESPAIR                                                    482
46-55         DESPAIR                                                    377
26-35         DESPAIR                                                    273
36-45         NO_ANSWER                                                  224
26-35         NO_ANSWER                                                  187
56+           DESPAIR                                                    177
46-55         NO_ANSWER                                                  109
56+           NO_ANSWER                                                   91
36-45         MEH                                                         38
18-25         NO_ANSWER                                                   35
              DESPAIR                                                     35
26-35         MEH                                                         33
unknown       NO_ANSWER                                                   29
26-35         JOY                                                         27
17 and under  DESPAIR                                                     27
unknown       DESPAIR                                                     27
46-55         MEH                                                         26
36-45         JOY                                                         24
56+           MEH                                                         18
17 and under  NO_ANSWER                                                   17
46-55         JOY                                                         13
56+           JOY                                                         12
18-25         MEH                                                         10
              JOY                                                          5
unknown       JOY                                                          3
17 and under  MEH                                                          3
              JOY                                                          2
unknown       MEH                                                          1
dtype: int64
====================================
2 respondents 17 or younger voted joy


## number of total respondents voted joy in the following: 
   #Blue M&M's, Red M&M's, Green Party M&M's, Independent M&M's, and Abstained from M&M'ing

print(pd.DataFrame(candy, columns = ['Q6 | Blue M&M\'s']).value_counts())
print('==============================================') 
print('963 total respondents voted joy in Blue M&M\'s')
print('==============================================')
print(pd.DataFrame(candy, columns = ['Q6 | Red M&M\'s']).value_counts())
print('==============================================') 
print('949 total respondents voted joy in Red M&M\'s')
print('==============================================') 
print(pd.DataFrame(candy, columns = ['Q6 | Green Party M&M\'s']).value_counts())
print('=====================================================') 
print('915 total respondents voted joy in Green Party M&M\'s')
print('=====================================================') 
print(pd.DataFrame(candy, columns = ['Q6 | Independent M&M\'s']).value_counts())
print('======================================================') 
print('711 total respondents voted joy in Independent M&M\'s')
print('======================================================') 
print(pd.DataFrame(candy, columns = ['Q6 | Abstained from M&M\'ing.']).value_counts())
print('==========================================================') 
print('202 total respondents voted joy in Abstained from M&M\'ing.')

Q6 | Blue M&M's
JOY                963
NO_ANSWER          667
MEH                563
DESPAIR            112
dtype: int64
==============================================
963 total respondents voted joy in Blue M&M's
==============================================
Q6 | Red M&M's
JOY               949
NO_ANSWER         669
MEH               574
DESPAIR           113
dtype: int64
==============================================
949 total respondents voted joy in Red M&M's
==============================================
Q6 | Green Party M&M's
JOY                       915
NO_ANSWER                 701
MEH                       561
DESPAIR                   128
dtype: int64
=====================================================
915 total respondents voted joy in Green Party M&M's
=====================================================
Q6 | Independent M&M's
NO_ANSWER                 749
JOY                       711
MEH                       592
DESPAIR                   253
dtype: int64
======================================================
711 total respondents voted joy in Independent M&M's
======================================================
Q6 | Abstained from M&M'ing.
NO_ANSWER                       868
DESPAIR                         662
MEH                             573
JOY                             202
dtype: int64
==========================================================
202 total respondents voted joy in Abstained from M&M'ing.


# We can print this for a list of column names

for col in candy.columns:
    print(col)

Q1: GOING OUT?
Q2: GENDER
Q3: AGE
Q4: COUNTRY
Q6 | 100 Grand Bar
Q6 | Anonymous brown globs that come in black and orange wrappers	(a.k.a. Mary Janes)
Q6 | Any full-sized candy bar
Q6 | Black Jacks
Q6 | Bonkers (the candy)
Q6 | Bonkers (the board game)
Q6 | Bottle Caps
Q6 | Box'o'Raisins
Q6 | Broken glow stick
Q6 | Butterfinger
Q6 | Cadbury Creme Eggs
Q6 | Candy Corn
Q6 | Candy that is clearly just the stuff given out for free at restaurants
Q6 | Caramellos
Q6 | Cash, or other forms of legal tender
Q6 | Chardonnay
Q6 | Chick-o-Sticks (we donÕt know what that is)
Q6 | Chiclets
Q6 | Coffee Crisp
Q6 | Creepy Religious comics/Chick Tracts
Q6 | Dental paraphenalia
Q6 | Dots
Q6 | Dove Bars
Q6 | Fuzzy Peaches
Q6 | Generic Brand Acetaminophen
Q6 | Glow sticks
Q6 | Goo Goo Clusters
Q6 | Good N' Plenty
Q6 | Gum from baseball cards
Q6 | Gummy Bears straight up
Q6 | Hard Candy
Q6 | Healthy Fruit
Q6 | Heath Bar
Q6 | Hershey's Dark Chocolate
Q6_new | Hershey's Milk Chocolate
Q6 | Hershey's Kisses
Q6 | Hugs (actual physical hugs)
Q6 | Jolly Rancher (bad flavor)
Q6 | Jolly Ranchers (good flavor)
Q6 | JoyJoy (Mit Iodine!)
Q6 | Junior Mints
Q6 | Senior Mints
Q6 | Kale smoothie
Q6 | Kinder Happy Hippo
Q6 | Kit Kat
Q6 | LaffyTaffy
Q6 | LemonHeads
Q6 | Licorice (not black)
Q6 | Licorice (yes black)
Q6 | Lindt Truffle
Q6 | Lollipops
Q6 | Mars
Q6 | Maynards
Q6 | Mike and Ike
Q6 | Milk Duds
Q6 | Milky Way
Q6 | Regular M&Ms
Q6 | Peanut M&MÕs
Q6 | Blue M&M's
Q6 | Red M&M's
Q6 | Green Party M&M's
Q6 | Independent M&M's
Q6 | Abstained from M&M'ing.
Q6 | Minibags of chips
Q6 | Mint Kisses
Q6 | Mint Juleps
Q6 | Mr. Goodbar
Q6 | Necco Wafers
Q6 | Nerds
Q6 | Nestle Crunch
Q6 | Now'n'Laters
Q6 | Peeps
Q6 | Pencils
Q6 | Pixy Stix
Q6 | Real Housewives of Orange County Season 9 Blue-Ray
Q6 | ReeseÕs Peanut Butter Cups
Q6 | Reese's Pieces
Q6 | Reggie Jackson Bar
Q6 | Rolos
Q6 | Sandwich-sized bags filled with BooBerry Crunch
Q6 | Skittles
Q6 | Smarties (American)
Q6 | Smarties (Commonwealth)
Q6 | Snickers
Q6 | Sourpatch Kids (i.e. abominations of nature)
Q6 | Spotted Dick
Q6 | Starburst
Q6 | Sweet Tarts
Q6 | Swedish Fish
Q6 | Sweetums (a friend to diabetes)
Q6 | Take 5
Q6 | Tic Tacs
Q6 | Those odd marshmallow circus peanut things
Q6 | Three Musketeers
Q6 | Tolberone something or other
Q6 | Trail Mix
Q6 | Twix
Q6 | Vials of pure high fructose corn syrup, for main-lining into your vein
Q6 | Vicodin
Q6 | Whatchamacallit Bars
Q6 | White Bread
Q6 | Whole Wheat anything
Q6 | York Peppermint Patties
Q10: DRESS
Q11: DAY
Q12: MEDIA [Daily Dish]
Q12: MEDIA [Science]
Q12: MEDIA [ESPN]
Q12: MEDIA [Yahoo]


#Select Q6 candy columns (Q6 | 100 Grand Bar through Q6 | York Peppermint Patties) 
# And save this as a new DataFrame called candy_reduced

candy_reduced = candy.loc[:, 'Q6 | 100 Grand Bar':'Q6 | York Peppermint Patties']
candy_reduced


candy_reduced[candy_reduced == 'JOY'].count().sort_values(ascending=False)

Q6 | Any full-sized candy bar                                                  1477
Q6 | ReeseÕs Peanut Butter Cups                                                1416
Q6 | Kit Kat                                                                   1367
Q6 | Cash, or other forms of legal tender                                      1363
Q6 | Twix                                                                      1339
                                                                               ... 
Q6 | JoyJoy (Mit Iodine!)                                                        72
Q6 | Gum from baseball cards                                                     43
Q6 | White Bread                                                                 43
Q6 | Candy that is clearly just the stuff given out for free at restaurants      37
Q6 | Broken glow stick                                                           24
Length: 103, dtype: int64


candy_reduced[candy_reduced == 'DESPAIR'].count().sort_values(ascending=False)

Q6 | Broken glow stick                                     1535
Q6 | Real Housewives of Orange County Season 9 Blue-Ray    1398
Q6 | Gum from baseball cards                               1386
Q6 | White Bread                                           1376
Q6 | Kale smoothie                                         1365
                                                           ... 
Q6 | Regular M&Ms                                            71
Q6 | Twix                                                    67
Q6 | Cash, or other forms of legal tender                    63
Q6 | Kit Kat                                                 47
Q6 | Any full-sized candy bar                                15
Length: 103, dtype: int64


candy_reduced[candy_reduced == 'MEH'].count().sort_values(ascending=False)

Q6 | Lollipops                                             877
Q6 | Hard Candy                                            855
Q6 | Bonkers (the candy)                                   818
Q6 | Minibags of chips                                     718
Q6 | 100 Grand Bar                                         715
                                                          ... 
Q6 | ReeseÕs Peanut Butter Cups                            178
Q6 | Kale smoothie                                         162
Q6 | Real Housewives of Orange County Season 9 Blue-Ray    129
Q6 | Broken glow stick                                      99
Q6 | Creepy Religious comics/Chick Tracts                   95
Length: 103, dtype: int64


candy_reduced[candy_reduced == 'NO_ANSWER'].count().sort_values(ascending=False)

Q6 | JoyJoy (Mit Iodine!)               942
Q6 | Maynards                           939
Q6 | Reggie Jackson Bar                 933
Q6 | Bonkers (the board game)           926
Q6 | Sweetums (a friend to diabetes)    924
                                       ... 
Q6 | Kit Kat                            616
Q6 | Any full-sized candy bar           615
Q6 | Hershey's Dark Chocolate           615
Q6_new | Hershey's Milk Chocolate       614
Q6 | Peanut M&MÕs                       614
Length: 103, dtype: int64


joy_count = pd.Series(candy_reduced[candy_reduced == 'JOY'].count())
joy_count

# candy_reduced[candy_reduced == 'JOY'].count()

Q6 | 100 Grand Bar                                                                         828
Q6 | Anonymous brown globs that come in black and orange wrappers\t(a.k.a. Mary Janes)     165
Q6 | Any full-sized candy bar                                                             1477
Q6 | Black Jacks                                                                            87
Q6 | Bonkers (the candy)                                                                   109
                                                                                          ... 
Q6 | Vicodin                                                                               674
Q6 | Whatchamacallit Bars                                                                  802
Q6 | White Bread                                                                            43
Q6 | Whole Wheat anything                                                                  110
Q6 | York Peppermint Patties                                                              1040
Length: 103, dtype: int64


despair_count = pd.Series(candy_reduced[candy_reduced == 'DESPAIR'].count())
despair_count

Q6 | 100 Grand Bar                                                                          80
Q6 | Anonymous brown globs that come in black and orange wrappers\t(a.k.a. Mary Janes)    1017
Q6 | Any full-sized candy bar                                                               15
Q6 | Black Jacks                                                                           754
Q6 | Bonkers (the candy)                                                                   467
                                                                                          ... 
Q6 | Vicodin                                                                               679
Q6 | Whatchamacallit Bars                                                                  273
Q6 | White Bread                                                                          1376
Q6 | Whole Wheat anything                                                                 1233
Q6 | York Peppermint Patties                                                               222
Length: 103, dtype: int64


candy_reduced_transpose = candy_reduced.T
candy_reduced_transpose = candy_reduced_transpose.reset_index()
candy_reduced_transpose


## Add a new column called "joy_count" using the `joy_count` Series above to the candy_reduced_transpose DataFrame
candy_reduced_transpose['joy_count'] = candy_reduced_transpose['index'].map(joy_count)
candy_reduced_transpose


#check joy_count sum 
joy_count.sum()

60771


## Add a new column called 'despair_count" using the despair_count Series above to the candy_reduced_transpose DataFrame.

candy_reduced_transpose['despair_count'] = candy_reduced_transpose['index'].map(despair_count)
candy_reduced_transpose


#check despair_count sum
despair_count.sum()

55169


# Add a new column to the candy_reduced_transpose DataFrame called "net_feelies" 
# that takes the joy_count column and subtracts the despair_count column

candy_reduced_transpose['net_feelies'] = candy_reduced_transpose['joy_count']-candy_reduced_transpose['despair_count']                           ### ENTER CODE HERE ###
candy_reduced_transpose


## net_feelies sum check

# np.random.seed(56)

# count = np.random.uniform(low = -57, high = 9, size = (len(candy_reduced_transpose))).round(0)

# net_feelies = count
# net_feelies.sum()


### This needs to be corrected! ###

candy_net_sorted = candy_reduced_transpose[['joy_count', 'despair_count', 'net_feelies']]#.sort_values(by='net_feelies', ascending=False)                          
candy_net_sorted.sort_values(by='net_feelies', ascending=False)


###### net_feelies sum check ######
## This need correction. Answer should be 5602, not -2523 ##

candy_net_sorted['net_feelies'].sum()

5602


candy_encode = candy.copy()
candy_encode.shape

(2305, 113)


# For the candy_encode DataFrame, replace any Female values with 0 and any Male values with 1

candy_encode['Q2: GENDER'].replace(['Female','Male'],[0,1], inplace=True)

candy_encode


# Select only the Q2: GENDER column from candy_encode and save this as candy_response(This should be a Series)

candy_response = candy_encode['Q2: GENDER']

candy_response

0       1
1       1
2       1
3       1
4       1
       ..
2300    1
2301    0
2302    0
2303    1
2304    0
Name: Q2: GENDER, Length: 2305, dtype: int64


### Drop the following columns from the candy_encode DataFrame: 
    # Q1: GOING OUT?,Q2: GENDER, Q3: AGE,Q4: COUNTRY,Q10: DRESS,Q11: DAY, Q12: MEDIA [Daily Dish],
    # Q12: MEDIA [Science],Q12: MEDIA [ESPN],Q12: MEDIA [Yahoo].
    
# Save the remaining columns as candy_features
    
    
candy_features = candy_encode.drop(columns = ['Q1: GOING OUT?','Q2: GENDER', 'Q3: AGE', 'Q4: COUNTRY', 
                                              'Q10: DRESS', 'Q11: DAY', 'Q12: MEDIA [Daily Dish]', 
                                              'Q12: MEDIA [Science]', 'Q12: MEDIA [ESPN]', 
                                              'Q12: MEDIA [Yahoo]'], axis =1)

candy_features


# Use Panda's get_dummies() to encode the candy_features data, making sure to set drop_first=True. 
# Save this as candy_features_encoded

candy_features_encoded = pd.get_dummies((candy_features), drop_first=True) 

candy_features_encoded

Data Cleaning¶

Halloween Candy¶

Overview and End Goal¶

Initial Import & Exploration¶

Data Cleaning¶

Before proceeding to next, we assign a new DataFrame: `candy2` for remaining columns and then next we combine `candy1` and `cand 2` DataFrames to get new modified `candy` dataset¶

Now, let's look at the very messy `Q4: COUNTRY` column and see what we can do about it. First, run the code below to look at the different unique values in the data.¶

We now want to look at the `Q3: AGE` column. Let's look at all the unique values.¶

Data Analysis¶

Encoding DataFrame¶

Final Analysis¶

	Internal ID	Q1: GOING OUT?	Q2: GENDER	Q3: AGE	Q4: COUNTRY	Q5: STATE, PROVINCE, COUNTY, ETC	Q6 \| 100 Grand Bar	Q6 \| Anonymous brown globs that come in black and orange wrappers\t(a.k.a. Mary Janes)	Q6 \| Any full-sized candy bar	Q6 \| Black Jacks	...	Q8: DESPAIR OTHER	Q9: OTHER COMMENTS	Q10: DRESS	Unnamed: 113	Q11: DAY	Q12: MEDIA [Daily Dish]	Q12: MEDIA [Science]	Q12: MEDIA [ESPN]	Q12: MEDIA [Yahoo]	Click Coordinates (x, y)
0	90258773	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	90272821	No	Male	44	USA	NM	MEH	DESPAIR	JOY	MEH	...	NaN	Bottom line is Twix is really the only candy w...	White and gold	NaN	Sunday	NaN	1.0	NaN	NaN	(84, 25)
2	90272829	NaN	Male	49	USA	Virginia	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	90272840	No	Male	40	us	or	MEH	DESPAIR	JOY	MEH	...	NaN	Raisins can go to hell	White and gold	NaN	Sunday	NaN	1.0	NaN	NaN	(75, 23)
4	90272841	No	Male	23	usa	exton pa	JOY	DESPAIR	JOY	DESPAIR	...	NaN	NaN	White and gold	NaN	Friday	NaN	1.0	NaN	NaN	(70, 10)
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2474	90314359	No	Male	24	USA	MD	JOY	DESPAIR	MEH	DESPAIR	...	Fruit Stripe Gum	NaN	White and gold	NaN	Friday	NaN	NaN	NaN	NaN	NaN
2475	90314580	No	Female	33	USA	New York	MEH	DESPAIR	JOY	NaN	...	Capers	NaN	Blue and black	NaN	Friday	NaN	1.0	NaN	NaN	(70, 26)
2476	90314634	No	Female	26	USA	Tennessee	MEH	DESPAIR	JOY	DESPAIR	...	NaN	NaN	Blue and black	NaN	Friday	NaN	1.0	NaN	NaN	(67, 35)
2477	90314658	No	Male	58	Usa	North Carolina	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2478	90314802	No	Female	66	usa	Pennsylvania	DESPAIR	DESPAIR	JOY	DESPAIR	...	NaN	You hit all my chocolate highlights, and broug...	White and gold	NaN	Sunday	1.0	NaN	NaN	NaN	(19, 26)

	Q1: GOING OUT?	Q2: GENDER	Q3: AGE	Q4: COUNTRY	Q6 \| 100 Grand Bar	Q6 \| Anonymous brown globs that come in black and orange wrappers\t(a.k.a. Mary Janes)	Q6 \| Any full-sized candy bar	Q6 \| Black Jacks	Q6 \| Bonkers (the candy)	Q6 \| Bonkers (the board game)	...	Q6 \| Whatchamacallit Bars	Q6 \| White Bread	Q6 \| Whole Wheat anything	Q6 \| York Peppermint Patties	Q10: DRESS	Q11: DAY	Q12: MEDIA [Daily Dish]	Q12: MEDIA [Science]	Q12: MEDIA [ESPN]	Q12: MEDIA [Yahoo]
0	No	Male	36-45	USA	MEH	DESPAIR	JOY	MEH	DESPAIR	DESPAIR	...	DESPAIR	DESPAIR	DESPAIR	DESPAIR	White and gold	Sunday	0.0	1.0	0.0	0.0
1	No	Male	46-55	USA	NO_ANSWER	NO_ANSWER	NO_ANSWER	NO_ANSWER	NO_ANSWER	NO_ANSWER	...	NO_ANSWER	NO_ANSWER	NO_ANSWER	NO_ANSWER	NO_ANSWER	NO_ANSWER	0.0	0.0	0.0	0.0
2	No	Male	36-45	USA	MEH	DESPAIR	JOY	MEH	MEH	DESPAIR	...	JOY	DESPAIR	DESPAIR	DESPAIR	White and gold	Sunday	0.0	1.0	0.0	0.0
3	No	Male	18-25	USA	JOY	DESPAIR	JOY	DESPAIR	MEH	DESPAIR	...	JOY	DESPAIR	DESPAIR	JOY	White and gold	Friday	0.0	1.0	0.0	0.0
4	No	Male	unknown	Other	JOY	DESPAIR	JOY	NO_ANSWER	NO_ANSWER	NO_ANSWER	...	JOY	DESPAIR	DESPAIR	JOY	NO_ANSWER	NO_ANSWER	0.0	1.0	0.0	0.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2300	No	Male	18-25	USA	JOY	DESPAIR	MEH	DESPAIR	DESPAIR	MEH	...	DESPAIR	MEH	DESPAIR	MEH	White and gold	Friday	0.0	0.0	0.0	0.0
2301	No	Female	26-35	USA	MEH	DESPAIR	JOY	NO_ANSWER	NO_ANSWER	NO_ANSWER	...	JOY	DESPAIR	MEH	JOY	Blue and black	Friday	0.0	1.0	0.0	0.0
2302	No	Female	26-35	USA	MEH	DESPAIR	JOY	DESPAIR	MEH	JOY	...	MEH	DESPAIR	DESPAIR	MEH	Blue and black	Friday	0.0	1.0	0.0	0.0
2303	No	Male	56+	USA	NO_ANSWER	NO_ANSWER	NO_ANSWER	NO_ANSWER	NO_ANSWER	NO_ANSWER	...	NO_ANSWER	NO_ANSWER	NO_ANSWER	NO_ANSWER	NO_ANSWER	NO_ANSWER	0.0	0.0	0.0	0.0
2304	No	Female	56+	USA	DESPAIR	DESPAIR	JOY	DESPAIR	DESPAIR	DESPAIR	...	DESPAIR	MEH	DESPAIR	JOY	White and gold	Sunday	1.0	0.0	0.0	0.0

	Internal ID	Q1: GOING OUT?	Q2: GENDER	Q3: AGE	Q4: COUNTRY	Q5: STATE, PROVINCE, COUNTY, ETC	Q6 \| 100 Grand Bar	Q6 \| Anonymous brown globs that come in black and orange wrappers\t(a.k.a. Mary Janes)	Q6 \| Any full-sized candy bar	Q6 \| Black Jacks	...	Q9: OTHER COMMENTS	Q10: DRESS	Unnamed: 113	Q11: DAY	Q12: MEDIA [Daily Dish]	Q12: MEDIA [Science]	Q12: MEDIA [ESPN]	Q12: MEDIA [Yahoo]	Click Coordinates (x, y)	Q6_new \| Hershey's Milk Chocolate
2305	90291228.0	Yes	Female	48	USA	KS	JOY	MEH	JOY	DESPAIR	...	Smarties (Commonwealth) are awesome. Sweetums ...	Blue and black	NaN	Friday	1.0	NaN	NaN	NaN	(31, 5)	NaN
2306	90291230.0	No	Female	60	United States	California	JOY	DESPAIR	JOY	MEH	...	NaN	Blue and black	NaN	Friday	NaN	1.0	NaN	NaN	(58, 6)	NaN
2307	90291318.0	NaN	Male	53	United States	Illinois	MEH	DESPAIR	JOY	MEH	...	NaN	White and gold	NaN	Friday	NaN	1.0	NaN	NaN	(80, 31)	NaN
2308	90291319.0	No	Male	37	us	IL	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2309	90291458.0	Yes	Male	42	USA	OH	JOY	DESPAIR	JOY	MEH	...	NaN	White and gold	NaN	Friday	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2474	90314359.0	No	Male	24	USA	MD	JOY	DESPAIR	MEH	DESPAIR	...	NaN	White and gold	NaN	Friday	NaN	NaN	NaN	NaN	NaN	NaN
2475	90314580.0	No	Female	33	USA	New York	MEH	DESPAIR	JOY	NaN	...	NaN	Blue and black	NaN	Friday	NaN	1.0	NaN	NaN	(70, 26)	NaN
2476	90314634.0	No	Female	26	USA	Tennessee	MEH	DESPAIR	JOY	DESPAIR	...	NaN	Blue and black	NaN	Friday	NaN	1.0	NaN	NaN	(67, 35)	NaN
2477	90314658.0	No	Male	58	Usa	North Carolina	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2478	90314802.0	No	Female	66	usa	Pennsylvania	DESPAIR	DESPAIR	JOY	DESPAIR	...	You hit all my chocolate highlights, and broug...	White and gold	NaN	Sunday	1.0	NaN	NaN	NaN	(19, 26)	NaN

	index	0	1	2	3	4	5	6	7	8	...	2295	2296	2297	2298	2299	2300	2301	2302	2303	2304
0	Q6 \| 100 Grand Bar	MEH	NO_ANSWER	MEH	JOY	JOY	NO_ANSWER	JOY	JOY	MEH	...	NO_ANSWER	NO_ANSWER	JOY	MEH	JOY	JOY	MEH	MEH	NO_ANSWER	DESPAIR
1	Q6 \| Anonymous brown globs that come in black ...	DESPAIR	NO_ANSWER	DESPAIR	DESPAIR	DESPAIR	NO_ANSWER	DESPAIR	DESPAIR	DESPAIR	...	NO_ANSWER	NO_ANSWER	DESPAIR	DESPAIR	MEH	DESPAIR	DESPAIR	DESPAIR	NO_ANSWER	DESPAIR
2	Q6 \| Any full-sized candy bar	JOY	NO_ANSWER	JOY	JOY	JOY	NO_ANSWER	JOY	JOY	JOY	...	NO_ANSWER	NO_ANSWER	JOY	JOY	JOY	MEH	JOY	JOY	NO_ANSWER	JOY
3	Q6 \| Black Jacks	MEH	NO_ANSWER	MEH	DESPAIR	NO_ANSWER	NO_ANSWER	DESPAIR	MEH	MEH	...	NO_ANSWER	NO_ANSWER	DESPAIR	DESPAIR	DESPAIR	DESPAIR	NO_ANSWER	DESPAIR	NO_ANSWER	DESPAIR
4	Q6 \| Bonkers (the candy)	DESPAIR	NO_ANSWER	MEH	MEH	NO_ANSWER	NO_ANSWER	DESPAIR	MEH	MEH	...	NO_ANSWER	NO_ANSWER	MEH	MEH	MEH	DESPAIR	NO_ANSWER	MEH	NO_ANSWER	DESPAIR
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
98	Q6 \| Vicodin	DESPAIR	NO_ANSWER	JOY	JOY	DESPAIR	NO_ANSWER	MEH	DESPAIR	DESPAIR	...	NO_ANSWER	NO_ANSWER	JOY	DESPAIR	MEH	JOY	NO_ANSWER	JOY	NO_ANSWER	JOY
99	Q6 \| Whatchamacallit Bars	DESPAIR	NO_ANSWER	JOY	JOY	JOY	NO_ANSWER	DESPAIR	MEH	MEH	...	NO_ANSWER	NO_ANSWER	MEH	NO_ANSWER	JOY	DESPAIR	JOY	MEH	NO_ANSWER	DESPAIR
100	Q6 \| White Bread	DESPAIR	NO_ANSWER	DESPAIR	DESPAIR	DESPAIR	NO_ANSWER	DESPAIR	DESPAIR	DESPAIR	...	NO_ANSWER	NO_ANSWER	DESPAIR	DESPAIR	MEH	MEH	DESPAIR	DESPAIR	NO_ANSWER	MEH
101	Q6 \| Whole Wheat anything	DESPAIR	NO_ANSWER	DESPAIR	DESPAIR	DESPAIR	NO_ANSWER	DESPAIR	DESPAIR	DESPAIR	...	NO_ANSWER	NO_ANSWER	MEH	DESPAIR	MEH	DESPAIR	MEH	DESPAIR	NO_ANSWER	DESPAIR
102	Q6 \| York Peppermint Patties	DESPAIR	NO_ANSWER	DESPAIR	JOY	JOY	NO_ANSWER	DESPAIR	DESPAIR	MEH	...	NO_ANSWER	NO_ANSWER	MEH	JOY	JOY	MEH	JOY	MEH	NO_ANSWER	JOY

	joy_count	despair_count	net_feelies
2	1477	15	1462
75	1416	88	1328
44	1367	47	1320
14	1363	63	1300
96	1339	67	1272
...	...	...	...
20	84	1356	-1272
74	86	1398	-1312
100	43	1376	-1333
28	43	1386	-1343
8	24	1535	-1511

Data Cleaning¶

Halloween Candy¶

Overview and End Goal¶

Initial Import & Exploration¶

Data Cleaning¶

Before proceeding to next, we assign a new DataFrame: candy2 for remaining columns and then next we combine candy1 and cand 2 DataFrames to get new modified candy dataset¶

Now, let's look at the very messy Q4: COUNTRY column and see what we can do about it. First, run the code below to look at the different unique values in the data.¶

We now want to look at the Q3: AGE column. Let's look at all the unique values.¶

Data Analysis¶

Encoding DataFrame¶

Final Analysis¶

Before proceeding to next, we assign a new DataFrame: `candy2` for remaining columns and then next we combine `candy1` and `cand 2` DataFrames to get new modified `candy` dataset¶

Now, let's look at the very messy `Q4: COUNTRY` column and see what we can do about it. First, run the code below to look at the different unique values in the data.¶

We now want to look at the `Q3: AGE` column. Let's look at all the unique values.¶